Abstract: In this paper we consider the problem of
locating a nonzero entry in a high-dimensional vector
from possibly adaptive linear measurements. We consider
a recursive bisection method which we dub the compressive
binary search and show that it improves on what any
nonadaptive method can achieve. We establish a
non-asymptotic lower bound that applies to all methods, regardless
of their computational complexity. Combined, these results
show that the compressive binary search is within a double
logarithmic factor of the optimal performance.

I. Introduction 

How should one approach the problem of finding a
needle in a haystack? Specifically, suppose that a
high-dimensional vector $x \in \mathbb{R}^n$ is known to have a single
nonzero entry. How can we efficiently find the location
of the nonzero? We will assume that we can learn about
$x$ by taking $m$ noisy linear measurements of the form

$$y_i = \langle a_i, x \rangle + z_i, \quad i = 1, \dots, m, \qquad (1)$$

where the measurement vectors $a_1, \dots, a_m$ have
Euclidean norm at most 1 and $z_1, \dots, z_m$ are i.i.d.
according to $\mathcal{N}(0, 1)$. Our question reduces to the problem
of choosing the vectors $a_1, \dots, a_m$ and constructing an
algorithm to estimate the location of the nonzero from
the measurements $y_1, \dots, y_m$.

This is a special case of support recovery in compressive
sensing (CS) [1, 2], since (1) is equivalent to the
linear model

$$y = Ax + z, \qquad (2)$$

where $y = (y_1, \dots, y_m)$, $A$ is the $m \times n$ matrix with
row vectors $a_1, \dots, a_m$, and $z = (z_1, \dots, z_m)$. (Note
that the rows of $A$ are normalized, as opposed to the
columns, which is another common convention in the
CS literature.) There are a variety of results on support
recovery in the context of (2) where the measurement
matrix $A$ is fixed in advance (i.e., is nonadaptive)
and satisfies certain desirable properties [3-9]. As an
example, one can show that if $A$ is generated by drawing
i.i.d. (symmetric) $\pm 1/\sqrt{n}$ entries and the signal $x$ is
1-sparse with nonzero entry equal to $\mu > 0$, then the
Lasso and Orthogonal Matching Pursuit (OMP) recover
the support of $x$ with high probability provided that

$$\mu > C \sqrt{(n/m) \log n}, \qquad (3)$$

with $C$ sufficiently large. Moreover, any method based
on such measurements requires $\mu$ to satisfy this lower
bound for some constant $C > 0$. This is essentially the
whole story when the measurements are nonadaptive.
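The nonadaptive baseline just described can be sketched as follows. This is a minimal illustration assuming NumPy. The function name `nonadaptive_one_sparse`, the problem sizes, and the factor 8 in the amplitude are hypothetical choices made for the example and are not taken from the paper. For a 1-sparse signal the first OMP step picks the column of $A$ with the largest absolute correlation with $y$, which is what the sketch implements.

```python
import numpy as np

def nonadaptive_one_sparse(n, m, mu, location, rng):
    """Estimate the support of a 1-sparse x from y = A x + z, with A fixed in advance."""
    # Random sign matrix scaled so that every row has unit Euclidean norm.
    A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)
    x = np.zeros(n)
    x[location] = mu
    y = A @ x + rng.standard_normal(m)  # model (2) with N(0, 1) noise
    # 1-sparse OMP: select the column of A most correlated with y.
    return int(np.argmax(np.abs(A.T @ y)))

rng = np.random.default_rng(0)
n, m = 1024, 128  # illustrative sizes (assumption)
mu = 8.0 * np.sqrt((n / m) * np.log(n))  # far above the scaling in (3)
estimate = nonadaptive_one_sparse(n, m, mu, location=17, rng=rng)
```

With the amplitude this far above the scaling in (3), the estimate should coincide with the true location; shrinking the factor 8 toward 1 makes errors increasingly likely.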

In contrast, suppose now that the system implementing
(1) can provide feedback in such a way as to allow for
the measurements to be taken adaptively, meaning that
$a_i$ may be chosen as a function of the observations up
to time $i - 1$, that is, $(y_1, \dots, y_{i-1})$. (This implicitly
assumes that $a_i$ is a deterministic function of this vector,
but there is no real loss of generality in this assumption.)
This instance of active or online learning has received
far less attention to date. However, in
recent work [10] we have established information bounds
showing that any support recovery method under any
adaptive sampling scheme (satisfying the conditions
above) will be unable to recover the correct support
unless the nonzero entry satisfies

$$\mu > C \sqrt{n/m}, \qquad (4)$$

for some constant $C > 0$.

Our contribution in this paper is twofold. In Section II,
we propose a compressive binary search algorithm which
recursively tests whether the nonzero entry is in the left
or right half of the current interval. We show that the
method reliably recovers the support of a 1-sparse vector
when the nonzero entry satisfies

$$\mu > C \sqrt{(n/m) \log \log_2 n}, \qquad (5)$$

with a constant $C > 2$. We then verify this analysis via
numerical simulations. Note that by using an adaptive
measurement scheme we are able to improve upon the
requirement in (3) by reducing the $\log n$ to $\log \log_2 n$,
but our scheme does not eliminate the logarithmic factor
entirely as in (4). A corollary of this result is that, in
contrast to the results of [10], which argued that in general
adaptive strategies do not improve over nonadaptive
strategies in terms of our ability to accurately recover
$x$, when $\mu$ satisfies (5) adaptive strategies
can significantly outperform nonadaptive ones by first
identifying the location of the nonzero and then reserving
a set of measurements to more accurately estimate the
value of the nonzero.



In contrast to this upper bound, in Section III, we
provide a simple proof that $\mu > C \sqrt{n/m}$ is necessary
for any method to work. This novel proof is in some
sense tailored to the binary search method, as it is based on
testing whether the nonzero component is in the left or
right half of $x$. In Section IV, we discuss related work
in more detail and directions for future work.

II. Compressive Binary Search 
A. The algorithm 

The algorithm is designed assuming that the target
vector $x$ has exactly one nonzero entry equal to $\mu > 0$;
both the location and magnitude are unknown. The
methodology described here can be easily adapted to the
case where the sign of the nonzero entry is unknown.
For simplicity, we assume that $n$ is dyadic, and let
$s_0 = \log_2 n$, where $\log_2$ denotes the logarithm in base 2.

With a budget of $m \ge 2 \log_2 n$ measurements of the
form (1), the binary search method proceeds as follows.
We divide our $m$ measurements into a total of $s_0$ stages,
allocating $m_s$ measurements to stage $s$, where

$$m_s := \lfloor (m - s_0) 2^{-s} \rfloor + 1, \qquad (6)$$

and $\lfloor a \rfloor$ denotes the largest integer not greater than
$a$. Note that we do not exceed our total measurement
budget since

$$\sum_{s=1}^{s_0} m_s = s_0 + \sum_{s=1}^{s_0} \lfloor (m - s_0) 2^{-s} \rfloor \le s_0 + (m - s_0) \sum_{s=1}^{s_0} 2^{-s} < m.$$

We also have $m_s \ge 1$ for all $s$, which is necessary for our
algorithm to be able to run to completion. Starting with
$J_0^{(1)} := \{1, \dots, n\}$, at stage $s = 1, \dots, s_0$ we have a
dyadic interval $J_0^{(s)}$ and consider its left and right halves,
denoted $J_1^{(s)}$ and $J_2^{(s)}$. For example, $J_1^{(1)} := \{1, \dots, n/2\}$
and $J_2^{(1)} := \{n/2 + 1, \dots, n\}$. Let $u^{(s)}$ denote the vector
with entries indexed by $J_1^{(s)}$ equal to $2^{-(s_0 - s + 1)/2}$,
entries indexed by $J_2^{(s)}$ equal to $-2^{-(s_0 - s + 1)/2}$,
and all other entries equal to zero.
Note that $\|u^{(s)}\| = 1$, since $|J_1^{(s)}| = |J_2^{(s)}| = 2^{s_0 - s}$. We
measure $m_s$ times with $u^{(s)}$, meaning that we observe

$$y_i^{(s)} = \langle u^{(s)}, x \rangle + z_i^{(s)}, \quad i = 1, \dots, m_s.$$

Based on these measurements, we decide between going
left or right, meaning we test whether the nonzero entry
is in $J_1^{(s)}$ or $J_2^{(s)}$. We do so by simply computing

$$w^{(s)} = \sum_{i=1}^{m_s} y_i^{(s)}.$$

Specifically, we set $J_0^{(s+1)} = J_1^{(s)}$ if $w^{(s)} > 0$, and
$J_0^{(s+1)} = J_2^{(s)}$ otherwise.
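The procedure above can be sketched in a few lines of Python. This is an illustrative reading of the description, assuming NumPy, 0-based indexing, and a positive nonzero entry. The function names `allocate_measurements` and `compressive_binary_search`, and the amplitude 40 in the usage lines, are hypothetical and chosen only for the example. The sum $w^{(s)}$ is simulated directly as $m_s \langle u^{(s)}, x \rangle$ plus the sum of $m_s$ independent standard normal noise terms.

```python
import numpy as np

def allocate_measurements(n, m):
    """Stage sizes m_s = floor((m - s0) * 2**(-s)) + 1 for s = 1, ..., s0, as in (6)."""
    s0 = n.bit_length() - 1  # log2(n) when n is a power of two
    if 2 ** s0 != n or m < 2 * s0:
        raise ValueError("need n dyadic and m >= 2 * log2(n)")
    return [(m - s0) // 2 ** s + 1 for s in range(1, s0 + 1)]

def compressive_binary_search(x, m, rng):
    """Return the estimated 0-based location of the single positive entry of x."""
    n = len(x)
    lo, hi = 0, n  # current interval J_0^{(s)} is [lo, hi)
    for m_s in allocate_measurements(n, m):
        mid = (lo + hi) // 2
        u = np.zeros(n)
        u[lo:mid] = 1.0    # left half J_1^{(s)}
        u[mid:hi] = -1.0   # right half J_2^{(s)}
        u /= np.sqrt(hi - lo)  # unit norm, zero outside the current interval
        # Sum of m_s noisy measurements taken with the same vector u.
        w = m_s * (u @ x) + rng.standard_normal(m_s).sum()
        if w > 0:
            hi = mid  # go left
        else:
            lo = mid  # go right
    return lo

rng = np.random.default_rng(0)
n, m = 4096, 256
x = np.zeros(n)
x[1234] = 40.0  # large amplitude chosen for illustration (assumption)
sizes = allocate_measurements(n, m)
location = compressive_binary_search(x, m, rng)
```

For these values the stage sizes sum to 251, which stays within the budget of 256, and every stage receives at least one measurement.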



B. Performance analysis 

The binary search improves on methods based on
nonadaptive measurements by weakening the requirement
(3) to (5).

Theorem 1. In our setting, with a single nonzero entry
equal to $\mu > 0$ and a measurement budget of
$m \ge 2 \log_2 n$, the probability that binary search fails to locate
the nonzero entry (denoted $P_e$) satisfies

$$P_e \le \frac{\log_2 n}{2} \exp\left( -\frac{\mu^2 m}{8n} \right). \qquad (7)$$



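Proof: Since the binary search algorithm is equivariant
with respect to the ordering of the entries, we
can begin by assuming without loss of generality that
$x = (\mu, 0, \dots, 0)^T$, i.e., the nonzero is located in the
first entry of $x$. Thus, we can use a simple union bound
to argue that

$$P_e \le \sum_{s=1}^{s_0} P(w^{(s)} \le 0), \qquad (8)$$

where $J_1^{(s)} = \{1, \dots, 2^{-s} n\}$ and
$J_2^{(s)} = \{2^{-s} n + 1, \dots, 2^{-s+1} n\}$. Under our assumptions, we have that

$$w^{(s)} \sim \mathcal{N}\left( m_s 2^{(s-1)/2} \mu / \sqrt{n}, \; m_s \right).$$

Thus we can bound

$$P(w^{(s)} \le 0) = \bar{\Phi}\left( 2^{(s-1)/2} \mu \sqrt{m_s / n} \right) \le \frac{1}{2} \exp\left( -\frac{2^s m_s \mu^2}{4n} \right), \qquad (9)$$

since for all $t \ge 0$ we have

$$\bar{\Phi}(t) := P(\mathcal{N}(0, 1) > t) \le \frac{1}{2} \exp(-t^2/2).$$

We next note that by construction,

$$m_s = \lfloor (m - s_0) 2^{-s} \rfloor + 1 > (m - s_0) 2^{-s}.$$

Since $m \ge 2 s_0$, we have that $m - s_0 \ge m/2$, and hence
we obtain

$$m_s 2^s > m - s_0 \ge m/2.$$

Plugging $m_s 2^s > m/2$ into (9), we obtain

$$P(w^{(s)} \le 0) \le \frac{1}{2} \exp\left( -\frac{\mu^2 m}{8n} \right).$$

Plugging this into (8) we arrive at

$$P_e \le \frac{s_0}{2} \exp\left( -\frac{\mu^2 m}{8n} \right),$$

as desired. ■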
Note that we need (5) with $C > 2\sqrt{2}$ for the upper
bound on $P_e$ in (7) to actually tend to zero as $n$ increases.
However, by taking additional measurements beyond the
$2 \log_2 n$ required by this theorem, we could loosen this
requirement to be able to set $C$ arbitrarily close to 2.
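To get a feel for the constants, the bound of Theorem 1 can be evaluated numerically. The sketch below uses only the Python standard library. The helper names `failure_bound` and `amplitude_for_bound` are hypothetical, and the second helper assumes a target level `delta` smaller than $(\log_2 n)/2$ so that the logarithm is positive.

```python
import math

def failure_bound(mu, n, m):
    """Right-hand side of (7): (log2(n) / 2) * exp(-mu**2 * m / (8 * n))."""
    return 0.5 * math.log2(n) * math.exp(-mu ** 2 * m / (8 * n))

def amplitude_for_bound(delta, n, m):
    """Smallest mu for which the bound (7) is at most delta (needs delta < log2(n) / 2)."""
    return math.sqrt((8 * n / m) * math.log(math.log2(n) / (2 * delta)))

n, m = 4096, 256
mu_half = amplitude_for_bound(0.5, n, m)  # amplitude at which the bound equals 1/2
```

For $n = 4096$ and $m = 256$ this gives an amplitude of about 17.8. This is only the point at which the sufficient condition (7) guarantees a failure probability of at most 1/2; the algorithm itself may succeed at smaller amplitudes.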
Fig. 1. Comparison between compressive binary search and OMP as
a function of $\mu$ for $n = 4096$ and $m = 256$. Observe that compressive
binary search can successfully identify the location of the nonzero for
smaller values of $\mu$ than OMP, but still requires $\mu > \sqrt{n/m} = 4$.



C. Numerical experiments 

To validate our theory, we perform some simple
numerical experiments. Specifically, we compare the
performance of the compressive binary search procedure to
that of OMP (with $A$ constructed with random $\pm 1/\sqrt{n}$
entries). Note that in the 1-sparse case, OMP simply
reduces to identifying the column of $A$ most highly
correlated with the measurements $y$. The performance
of these two algorithms is shown in Figure 1, which
shows the empirical probability of error as a function
of $\mu$ computed by averaging over 100,000 trials. For
these experiments, we set $n = 4096$ and $m = 256$.
Note that for these values of $n$ and $m$, we have that
$\sqrt{n/m} = 4$ and $\sqrt{(n/m) \log \log_2 n} \approx 6.3$. Thus,
ignoring the constant terms in (4) and (5), we see that
the performance of the compressive binary search is
largely consistent with our theory: it cannot
reliably identify the location of the nonzero when $\mu < 4$
but can for $\mu > 6.3$. Moreover, recall that as noted
in (3), the nonadaptive OMP algorithm requires that $\mu$
exceed $\sqrt{(n/m) \log n} \approx 11.5$ to succeed. Again
ignoring constants, in our case this corresponds to requiring
$\mu$ to be roughly 1.8 times larger than is required for
the compressive binary search procedure, and this is
precisely the behavior we observe in Figure 1.
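The three thresholds quoted above follow from direct arithmetic, which the short standard-library check below reproduces. The variable names are illustrative; constants in (3), (4), and (5) are ignored, as in the text.

```python
import math

n, m = 4096, 256
lower = math.sqrt(n / m)                                # scaling in (4)
adaptive = math.sqrt((n / m) * math.log(math.log2(n)))  # scaling in (5)
nonadaptive = math.sqrt((n / m) * math.log(n))          # scaling in (3)
ratio = nonadaptive / adaptive

print(f"{lower:.1f} {adaptive:.1f} {nonadaptive:.1f} {ratio:.1f}")
# prints: 4.0 6.3 11.5 1.8
```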

III. Lower Bound: Left or Right? 

We now establish an explicit, non-asymptotic lower
bound for adaptive support recovery, valid for any
recovery method based on adaptive measurements satisfying
the conditions required here. Though such bounds were
recently derived in [10], we provide a much simpler
proof here for the case of 1-sparse signals that closely
aligns with the core idea of the compressive binary
search.

Let $y_{[i]} := (y_1, \dots, y_i)$ denote the information
available after taking $i$ measurements. Let $P_x$ denote the
distribution of these measurements when the target vector
is $x$. Without loss of generality, we assume that $a_i$ is a
deterministic function of $y_{[i-1]}$. In that case, using the
fact that $y_i$ is independent of $y_{[i-1]}$ when $a_i$ is given,
we have

$$P_x(y_{[m]}) = \prod_{i=1}^{m} P_x(y_i \mid a_i). \qquad (10)$$

For a subset $K \subset \{1, \dots, n\}$, let $K^c := \{1, \dots, n\} \setminus K$
and let $x_K$ be the part of $x$ indexed by $K$.

Let $\|P - Q\|_{TV}$ denote the total variation metric
between distributions $P$ and $Q$, and $K(P, Q)$ their
Kullback-Leibler divergence [11], related by Pinsker's
inequality

$$\|P - Q\|_{TV}^2 \le \frac{1}{2} K(P, Q). \qquad (11)$$
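As a sanity check on Pinsker's inequality, consider the simplest case of two unit-variance Gaussians with means 0 and $\mu$. The sketch below assumes the total variation distance is normalized as $\sup_A |P(A) - Q(A)|$, so that the inequality reads $\|P - Q\|_{TV}^2 \le K(P, Q)/2$. Under that convention the distance between $\mathcal{N}(0, 1)$ and $\mathcal{N}(\mu, 1)$ is $\mathrm{erf}(\mu / (2\sqrt{2}))$ and the divergence is $\mu^2/2$. The function names are hypothetical.

```python
import math

def tv_gaussian_shift(mu):
    """Total variation sup_A |P(A) - Q(A)| between N(0, 1) and N(mu, 1)."""
    return math.erf(abs(mu) / (2 * math.sqrt(2)))

def kl_gaussian_shift(mu):
    """Kullback-Leibler divergence between N(0, 1) and N(mu, 1)."""
    return mu ** 2 / 2

for mu in (0.1, 0.5, 1.0, 2.0, 4.0):
    tv, kl = tv_gaussian_shift(mu), kl_gaussian_shift(mu)
    assert tv ** 2 <= kl / 2  # Pinsker's inequality under the stated normalization
```

The inequality is reasonably tight for small $\mu$ and becomes loose for large $\mu$, where the total variation saturates at 1 while the divergence keeps growing.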

Lemma 1. Suppose that $n$ is even and let
$J_1 = \{1, \dots, n/2\}$ and $J_2 = \{n/2 + 1, \dots, n\}$. For $r = 1, 2$,
let $\pi_r$ denote the uniform prior on the vectors $x \in \mathbb{R}^n$
having a single nonzero entry equal to $\mu > 0$, located in
$J_r$. Let $P_r$ denote the distribution of $y_{[m]}$ when $x \sim \pi_r$.
Then

$$\|P_2 - P_1\|_{TV}^2 \le \frac{\mu^2 m}{n}.$$

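Proof: Let $P_0$ denote the distribution of $y_{[m]}$ when
$x = 0$, which is multivariate normal with zero mean and
covariance $I$. Using Pinsker's inequality (11), we have

$$\|P_2 - P_1\|_{TV}^2 \le 2 \|P_0 - P_1\|_{TV}^2 + 2 \|P_0 - P_2\|_{TV}^2 \le K(P_0, P_1) + K(P_0, P_2).$$

Let $P_{(j)}$ denote the distribution of $y_{[m]}$ when the nonzero
entry (equal to $\mu$) is at $j \in \{1, \dots, n\}$. By the law of
total probability,

$$P_1 = \frac{2}{n} \sum_{j \in J_1} P_{(j)},$$

and obviously

$$P_0 = \frac{2}{n} \sum_{j \in J_1} P_0,$$

which allows us to use the convexity of the KL divergence
[12], to obtain

$$K(P_0, P_1) \le \frac{2}{n} \sum_{j \in J_1} K(P_0, P_{(j)}).$$

Under $P_{(j)}$, $y_i = \mu a_{i,j} + z_i$, while under $P_0$, $y_i = z_i$, so
that

$$\begin{aligned}
K(P_0, P_{(j)}) &= -E_0 \log \frac{P_{(j)}(y_{[m]})}{P_0(y_{[m]})} \\
&= -E_0 \sum_{i=1}^{m} \log \frac{P_{(j)}(y_i \mid a_i)}{P_0(y_i \mid a_i)} \\
&= E_0 \sum_{i=1}^{m} \left( \frac{\mu^2 a_{i,j}^2}{2} - \mu\, y_i\, a_{i,j} \right) \\
&= \frac{\mu^2}{2} \sum_{i=1}^{m} E_0\, a_{i,j}^2.
\end{aligned}$$

The first line is by definition; the second and third are
consequences of (10) and the definition of the normal
likelihood; the fourth line is because, under $P_0$, $y_i$ is
independent of $a_{i,j}$ and has zero mean. Hence,

$$K(P_0, P_1) \le \frac{\mu^2}{n} \sum_{i=1}^{m} E_0 \sum_{j \in J_1} a_{i,j}^2,$$

and similarly,

$$K(P_0, P_2) \le \frac{\mu^2}{n} \sum_{i=1}^{m} E_0 \sum_{j \in J_2} a_{i,j}^2,$$

so that

$$K(P_0, P_1) + K(P_0, P_2) \le \frac{\mu^2}{n} \sum_{i=1}^{m} E_0 \sum_{j=1}^{n} a_{i,j}^2 \le \frac{\mu^2 m}{n},$$

since $\|a_i\| \le 1$ for all $i$. ■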

Lemma 1 implies an information bound on deciding
whether a vector $x \in \mathbb{R}^n$ with a single nonzero entry
equal to $\mu$ is supported on the first half or second half of
the index set $\{1, \dots, n\}$. Proving this result by directly
looking at the likelihood ratio, which is the standard
approach to deriving such a result, seems quite delicate
as we are testing a mixture (supported on the first half)
versus a mixture (supported on the second half).

Theorem 2. In the setting of Lemma 1, consider testing
$H_1$ versus $H_2$, where $H_r$ is the hypothesis by which $x$ is
supported in $J_r$. Then under the uniform prior, for any
test $T$,

fir fails) > 1 - ^^m/n. 

Note that the lower bound is also valid in the minimax 
sense. In fact, the uniform prior is least favorable by 
invariance consideration [13, Sec. 8.4]. 

Proof: Under the uniform prior, we are effectively
testing $P_1$ versus $P_2$. It is well-known that the likelihood
ratio test, which rejects when $L > 1$, with $L := P_2 / P_1$,
has minimum risk, equal to



^(L<l)+P2(i>l) = l-i||P2 



IIITV- 



We then apply Lemma 1 to bound the total variation
distance on the RHS. ■



IV. Discussion 

Our main results can be cast in the following terms:
Theorem 1 implies that, with probability at least 1/2, the
binary search method succeeds at locating the nonzero
entry when

$$\mu^2 \ge \frac{n}{m} \, 8 \log \log_2 n,$$

while Theorem 2 shows that all tests for deciding
whether the nonzero entry is in the first or second half of
the index set $\{1, \dots, n\}$ fail with probability at least 1/2
when

, n 
-. 

m 

Clearly, the bounds do not match. Numerically, for
$n \le 10^6$, $\log \log_2 n < 3$, in which case the discrepancy
is a multiplicative factor of 24. It would be interesting to
know whether these bounds can be tightened, and
particularly, whether there are algorithms that outperform
the compressive binary search in a substantial way.

While the problem of detecting the support of a
1-sparse vector might seem to have only limited applications,
in fact one can extend any algorithm that identifies
the support of a 1-sparse vector to one that works for
vectors with $k \ge 2$ nonzero entries. This can be done
by first exploiting a simple hashing scheme which will
(with high probability) isolate each nonzero, and then
applying the method for 1-sparse vectors to each hash
separately. For an overview of this approach in a similar
context, see [4].
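The isolation step can be illustrated with a toy simulation. The sketch below is not the construction of [4]; it assumes NumPy and a fully random assignment of indices to buckets, and the names `hash_indices` and `all_isolated`, the support, and the bucket count are all hypothetical. It estimates how often every nonzero lands in a bucket of its own, which is the event under which a 1-sparse method could be run on each bucket.

```python
import numpy as np

def hash_indices(n, num_buckets, rng):
    """Assign each index in {0, ..., n-1} to a bucket uniformly at random."""
    return rng.integers(0, num_buckets, size=n)

def all_isolated(support, buckets):
    """True if no two support indices share a bucket."""
    hit = buckets[np.asarray(support)]
    return len(np.unique(hit)) == len(hit)

rng = np.random.default_rng(0)
n, support = 4096, [5, 900, 3000]  # k = 3 nonzero locations (illustrative)
num_buckets = 64                   # hypothetical choice, much larger than k**2
trials = 2000
rate = np.mean([all_isolated(support, hash_indices(n, num_buckets, rng))
                for _ in range(trials)])
```

With 64 buckets and three nonzeros, the exact isolation probability is $(63/64)(62/64) \approx 0.95$, so the empirical rate should be close to that value. Using more buckets raises this probability but leaves fewer measurements for each one.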

We also note that [4] independently proposes a method 
very similar to the compressive binary search approach 
we describe. Though [4] considers a different setting 
with continuous signals (instead of vectors as we do), 
the method proposed is essentially the same, except 
that the measurement budget is partitioned differently. 
In particular, it is not obvious to us that the strategy
in [4] will always succeed, since it does not account
for rounding effects or enforce that a base number of
measurements is reserved for each scale (stage). To the
best of our understanding, the method might therefore
exhaust its measurement budget before the algorithm
terminates. Another key difference is that by considering
the simpler setting of a vector in $\mathbb{R}^n$, we can significantly
simplify the analysis. That being said, the conclusions
of [4] are broadly similar to our own.

Finally, we also note that there are a few other adaptive
algorithms that have been proposed in this setting.
For example, the Compressive Distilled Sensing (CDS)
algorithm proposed in [14] is a CS sampling
algorithm which performs sequential subset selection via
the random projections typical of CS. In a different
direction, [15, 16] suggest Bayesian approaches where
the measurement vectors are sequentially chosen so as to
maximize the conditional differential entropy of $y_i$ given
$(y_1, \dots, y_{i-1})$. While it remains a challenge to obtain
performance bounds for the Bayesian methods suggested
in [15, 16], CDS is analyzed in detail in [14] for the task
of estimating a $k$-sparse vector $x$. Following the proof
with a view on support recovery, one can establish that
CDS is reliable in our context when

$$\mu > C_n \sqrt{n/m},$$

with $C_n \to \infty$ arbitrarily slowly, coming extremely close
to the lower bound of (4). However, the algorithm seems
to require that $m > n^{\alpha}$ for a fixed constant $\alpha > 0$, while
binary search only requires $m \ge 2 \log_2 n$. An important
question is whether there exist methods which allow for
both small $m$ and $\mu$ approaching the bound in (4).